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ORIGINAL ARTICLE 

Data mining reveals a network of early-response genes as a 
consensus signature of drug-induced in vitro and in vivo toxicity 

JD Zhang, N Berntenis, A Roth and M Ebeling 

Gene signatures of drug-induced toxicity are of broad interest, but they are often identified from small-scale, single-time point 
experiments, and are therefore of limited applicability. To address this issue, we performed multivariate analysis of gene expression, 
cell-based assays, and histopathological data in the TG-GATEs (Toxicogenomics Project-Genomics Assisted Toxicity Evaluation 
system) database. Data mining highlights four genes — EGRl, ATF3, GDF15 and FGF21 — that are induced 2h after drug 
administration in human and rat primary hepatocytes poised to eventually undergo cytotoxicity-induced cell death. Modelling and 
simulation reveals that these early stress-response genes form a functional network with evolutionarily conserved structure and 
intrinsic dynamics. This is underlined by the fact that early induction of this network in vivo predicts drug-induced liver and kidney 
pathology with high accuracy. Our findings demonstrate the value of early gene-expression signatures in predicting and 
understanding compound-induced toxicity. The identified network can empower first-line tests that reduce animal use and costs of 
safety evaluation. 
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INTRODUCTION 

Drug toxicity and associated adverse events are critical issues in 
drug development: by a general estimate, 20-40% of novel drug 
candidates fail because of safety issues.^"^ Over the past 10 years, 
gene expression profiling has been introduced into drug 
development to predict and understand toxicity in pre-clinical 
settings, either as a stand-alone method"^"^ or integrated within 
systematic approaches.^"^^ 

In this work, we focus on the problem of identifying 
transcriptomics signatures that (a) predict toxicity prior to 
histopathological observations and (b) extrapolate between 
in vitro and in vivo settings and across species. Several studies 
have undertaken this task:^"^"^^ while making remarkable progress, 
they have illustrated two major challenges caused by limited data 
availability.^^ First, most studies have derived signatures from 
small-scale experiments with one or few compounds. Such 
signatures suffer both from limited statistical power and from 
limited translatability to other compounds. At the same time, it is 
difficult to interpret 'black-box' signatures, which are statistically 
significant but not associated with biological function. Second, 
most studies have sought late-time signatures (>24h). Such 
signatures are likely to be downstream effectors of signalling 
networks, which are compound specific and highly dependent on 
the specific cellular context and the availability of complex 
regulatory elements, as recently revealed by the ENCODE 
project.^ ^ In contrast, early-response genes (<24h) may be less 
context specific and more generic, because many of them are 
essential transcription factors or regulators of stress response,^° 
and they are closer to the core machineries of 'bow-tie'-like 
signalling networks.^^ 

It is not feasible to identify early-response toxicity signatures 
from large-scale experiments with current bottom-up practices.^^ 



Therefore, we took a top-down approach by analysing TG-GATEs 
(Toxicogenomics Project-Genomics Assisted Toxicity Evaluation 
system), a toxicogenomics database covering 170 compounds 
(Supplementary Table 1). The data comprise time-series gene 
expression, cell-based assay readouts and pathological records 
(Figure la). TG-GATEs is publicly available and is one of the most 
comprehensive data sets up to date.^'^^"^^ 

Here we report an ab initio analysis of TG-GATEs using a novel 
computational pipeline (Figure 1 b). We start by reporting data that 
support the generality of early-response genes. Next, we describe 
how integrative analysis of differential gene expression and 
cellular assay data revealed a consensus set of cytotoxicity 
signatures in human and rat primary hepatocytes. We provide 
evidence that the signatures form an evolutionarily conserved 
functional network that is responsible for early stress response. 
Finally, we confirm the network's predictive power for hepatic and 
renal pathology in a short-term (<30 days) study in vivo. 



MATERIALS AND METHODS 

Data pre-processing 

Raw data were downloaded from the TG-GATEs website (http://toxico. 
nibio.go.jp). Gene expression was profiled with Affymetrix Human Genome 
U133 PLUS 2.0 (Santa Clara, CA, USA) or Rat Genome 230 2.0 chips (Santa 
Clara, CA, USA). For each in vitro experiment, a Pico-Green fluorescence 
assay was performed, which quantitatively measures the total DNA content 
of cells. For in vivo experiments, histopathology in liver and kidney was 
assessed by pathologists (Supplementary Table 2). 

Expression data were pre-processed with the MASS method.^^ 
Differential gene expression profiles were determined by linear models 
using limma,^^ and expressed in logarithmic (base 2) fold changes (logFC). 
For in vitro samples, cytotoxicity was determined by the difference of total 
DNA content between compound-treated samples and the time-matched 
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control samples. Large reduction of DNA content indicates strong 
cytotoxicity. 

Identification of early-response gene signatures of cytotoxicity 
We merged all differential gene expression profiles of human samples, 
irrespective of compound, dose and time, into one matrix. Unsupervised 
clustering was performed and classified the samples into three groups. The 
statistical association between the groups and DNA content was 
determined by one-way analysis of variance. 

A cytotoxicity matrix was constructed for each compound by projecting 
the three sample groups (weak, moderate and strong, referring to 
corresponding cytotoxicities) onto a two-way matrix of dose and time. 
Gene signatures were derived from progressive profiles, where compound 
treatment causes weak, moderate and strong cytotoxicity at 2, 8 and 24 h, 
respectively. Hierarchical linear models^^ were fitted to capture early- 
response signatures (Supplementary Methods). 

We validated the signatures by analysing rat data with the same 
procedure, not involving any information from human data. 

Modelling and simulation with Boolean networks 
Inspired by a recently developed method, the Boolean Network 
Ensembles, which generates semiquantitative time-response simulations 
from the network structure,^^ we developed an algorithm that generates 
time-response profiles of expected node occupancy fractions given initial 
states (Supplementary Methods). Intuitively, the occupancy fraction of a 
node is the expected probability that the node will be ON. The time 
evolution of occupancy fractions simulates the dynamics of the network. 

Predicting liver and kidney pathology in vivo 
In vivo expression and histopathological records were randomly split into a 
training set (80%) and a test set (20%). Radial-kernel support vector 
machines (SVMs) were trained by 10-fold cross-validation using LIBSVM.^^ 
Optimal parameters of SVMs (C and gamma) were determined by grid 
search. Accuracy was measured by the Fi score, which combines precision 
and recall of predictions (Supplementary Methods). 

RESULTS 

Early-response genes are more generic than late-time induced 
genes 

We performed differential gene expression analysis to human 
gene expression profiles in TG-GATEs, by comparing compound- 
treated cells with the time-matched controls (Figure IbA). 

We investigated how global differential expression patterns 
change over time. In particular, we focused on the differentially 
expressed genes (DEGs) with |logFC|>0.5 and multiple-testing 
adjusted P<0.05 (Benjamini-Hochberg method). Notably, irre- 
spective of compound dose, there are generally fewer genes 
induced at 2h than at 24 h (Figure 2a). 

Given the number of DEGs and the number of DEG-inducing 
treatments at each time point, we asked whether DEGs are 
induced by similar numbers of treatments at three time points. To 
test this, we calculated the generality score for each DEG, 
representing how often it is induced by different treatments. 
DEGs with higher scores are more generic (less specific) than DEGs 
with lower scores, as they are induced by more treatments. A 
comparison of the scores shows that genes induced at 2 h are 
likely to be more generic than genes induced at 8 or 24 h 
(Figure 2b). A similar pattern was observed in the rat data 
(Supplementary Figure 1). 

Although there are generally fewer genes induced at 2 h, these 
genes tend to be more generic in the sense that they are 
modulated by multiple treatments. This observation guided us to 
focus on early-response genes as potential generic toxicity 
signatures in subsequent analyses. 

Early-response toxicity signatures identified in human hepatocytes 
To identify early toxicity signatures in vitro, we set out to 
determine which combinations of compound, dose and time 



cause cytotoxicity. We took a data-driven approach to address this 
question by integrating gene expression data with the results of 
DNA quantification assays (Figure IbB). Unsupervised clustering of 
differential expression profiles revealed that compound-treated 
samples are classified into three groups with distinct features 
(Figure 3a). Statistical analysis showed that the groups are 
significantly correlated with decreasing DNA content, or equiva- 
lently, with increasing cytotoxicities (Figure 3b). 

We assigned one of the three cytotoxicity levels (weak, 
moderate or strong) to each sample, which is associated with a 
unique combination of compound, dose and time. Subsequently, 
we built a cytotoxicity matrix for each compound, presenting how 
its cytotoxicity varies with dose and time (Figure 1 bC). Cytotoxicity 
matrices of all compounds tested in human primary hepatocytes 
are given in Supplementary Figure 2, and two of them are 
illustrated in Figure 3c as examples: vitamin A, which is nontoxic 
over the entire tested time and dose range, and nitrofurantoin, 
which is an antibiotic that showed increasing toxicity both along 
the dose gradient and with time. 

To identify early cytotoxicity signatures, which are detectable 
before the toxicity is visible at the molecular or phenotypic level, 
we focused on profiles of the progressive type, like the one 
described for nitrofurantoin, with weak, moderate and strong 
cytotoxicities at 2, 8 and 24 h, respectively (Figure IbD). We asked 
if there are genes that are robustly induced at 2 h in such profiles, 
arguing that they may be predictive signatures of the cytotoxicity 
outcome at 24 h (Figure IbE). 

Statistical analysis identified five genes that are significantly 
upregulated at 2h in progressive profiles: EGR1, GDF15, ATF3, 
FGF21 and IL8; and one gene that is downregulated: T0B2. We 
note that these genes are induced by compounds of diverse 
chemical and pharmacological properties (Table 1). Nevertheless 
the genes show consistent temporal expression patterns in 
progressive profiles. In contrast, no significant patterns were 
detected when their expression profiles were examined in 
treatments causing no or weak toxicity up to 24 h (Supplemen- 
tary Figure 3). 

Conserved early cytotoxicity signatures between rat and human 
The analysis procedure described above was then applied to data 
from rat primary hepatocytes. We emphasize again that no 
information obtained from the human data were carried over. 

Compound-induced expression profiles were classified into four 
groups in rat (Supplementary Figure 4). Three of them resemble 
the groups in human (weak, moderate and strong): they are 
associated with distinct expression profiles, and are significantly 
correlated with increasing cytotoxicities (Figure 4a). The fourth 
group only contains eight samples treated by either acetamino- 
phen (paracetamol) or phenobarbital, both non-genotoxic carci- 
nogens (NGTX) inducing tumour formation in rodent models.^^'^^ 
Samples in this group contrast strongly with other treatments as 
judged by their gene expression profiles (Supplementary 
Figure 4). They are associated with loss of total DNA, in line with 
previous observations.^^ 

We built cytotoxicity matrices for all compounds that were 
tested in rat primary hepatocytes (Supplementary Figure 5), and 
focused again on the progressive profiles. Comparing compound 
sets inducing progressive profiles in human and rat, we found 
only two overlapping compounds (Table 1 and Figure 4b), no 
more than the random expectation (P = 0.46 by bootstrapping). 
The substantial lack of overlap may reflect the distinct bioavail- 
ability and different modes of action of compounds in the two 
species. In contrast, the overlap of frequently induced genes 
(defined as DEGs that are induced in >5% of samples) is far 
stronger (81 genes; Figure 4b, P = 9.0E-6 by bootstrapping). 
The two comparisons suggest that although human and 
rat hepatocytes exhibit distinct temporal responses to 
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Figure 1. Study design of TG-GATEs (Toxicogenomics Project-Genomics Assisted Toxicity Evaluation system) and the computational pipeline 
for data mining with TG-GATEs. (a) In TG-GATEs, 170 compounds are tested in up to three doses (including corresponding vehicle controls) in 
three systems: human primary hepatocytes, rat primary hepatocytes and rat in vivo. In in vitro experiments, compounds are administrated 
once at Oh (red triangle). Gene expression profiling and Pico-Green DNA quantification assays are performed at three time points (2, 8 and 
24 h). In in vivo experiments, compounds are dosed either once or repeatedly. Gene expression profiling and examination of pathology in liver 
and kidney take place at 3, 6, 9 and 24 h (single-dose) or at 4, 8, 1 5 and 29 days (repeated dose). Two biological replicates are available for most 
experiments, (b) The computational pipeline that integrates gene expression data, cell-assay readouts and pathology records to identify early 
signatures of in vitro and in vivo toxicity. Details are described in the Results section. 



toxicity-inducing compounds, the underlying mechanism is 
conserved to some extent. 

We applied statistical analysis to detect early-response genes 
from progressive profiles in rat. Notably, out of the six genes 
found in the human signatures, four orthologous genes were 
identified as early cytotoxicity signatures in rat: Egrl, AtfS, GdflS 
and Fgf21 (Figure 4c). Sequence analysis revealed high similarities 
between the orthologs (Supplementary Figure 6A). The two 



human genes that were not selected as signature genes in rat 
either do not have a rat ortholog (IL8, Supplementary Figure 6B) or 
just fail to achieve the predefined statistical significance {T0B2, 
Supplementary Figure 6C). 

The observation that four early-response genes show consistent 
compound-induced activation in human and rat hepatocytes 
suggests that they are likely to be evolutionarily conserved, func- 
tionally linked and intrinsic to the cell's stress-response system. 
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Figure 2. Across the TG-GATEs (Toxicogenomics Project-Genomics 
Assisted Toxicity Evaluation system) data set, there are fewer early- 
response genes but they are more generic than late-response genes, 
(a) Number of differentially expressed genes (DEGs) induced by 
compounds in human primary hepatocytes, stratified by dose and 
time. Numbers of DEGs induced by TG-GATEs compounds are 
represented in boxplots, with each blue dot representing one 
compound. Black dots indicate the median number of DEGs. (b) The 
generality score of DEGs stratified by dose and time. The score 
measures how often a DEG is induced by different treatments. DEGs 
with higher scores are more generic (less specific) than DEGs with 
lower scores. The scores are normalized to the mean value of DEGs 
at 24 h. Bar heights and error bars indicate the average score and the 
standard deviation, respectively. 



An early-response network with conserved structure and intrinsic 
dynamics 

Compound-induced expression changes of EGRl, GDF15, FGF21 
and ATF3 in vitro are summarized in Figure 5a. The side-by-side 
comparison reveals strikingly conserved dynamics in human and 
rat: the induction of all four genes at 2 h, followed by decay of 
EGRl and persistent activation of FGF2h GDF15 and ATF3. 

Literature search^"^""^^ allowed us to construct a functional 
network of the four genes (Figure 5b). The well-connected 
network is small but non-trivial: it has an auto-inhibition loop 
(EGRl), an auto-activation loop (ATF3), a negative feedback loop 
[EGRl and ATF3), and a feed-forward loop {EGRhATF3 and GDF15). 
Such components, known as network motifs, can encode dynamic 
behaviour of networks."^^ 

We hypothesized that the conserved dynamics is intrinsic to the 
network, and tested this with a Boolean network model 
(Figure 1 bG)."^"^'"^^ Boolean networks, as the name suggests, are 
defined as a set of nodes and edges. All nodes (representing, for 
example, genes, proteins or small molecules) are in either one of 
the two states: ON and OFF, and edges (interactions) between 
nodes are characterized as either activation or inliibition. Update 
functions change the states of nodes as a function of their 
incoming activating and inhibiting edges. Computational analysis 
of a Boolean network produces all the possible steady states of the 
network, namely the states of the network for which no further 
updates of node states are taking place, that is, when the network 
is in equilibrium. In addition, the Boolean network formalism is in 
principle able to simulate the dynamics of a network given the 
initial states of all network nodes. 

Given the network structure in Figure 5b, we identified two 
steady states (5):5i (EGR OFF, ATF3, GDF15, and FGF21 OA/) 
represents inactive (not induced) EGRl and active (induced) 
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ATF3, GDF15, and FGF21; $2 (all four genes OFF) represents all 
genes at the baseline level. Note that 5i matches the state of 
progressive profiles at 24 h (Figures 5a and b), and 52 matches the 
state of non-toxic treatments (Supplementary Figures 3A and B). 
The Boolean network model thus is in agreement with the two 
observed steady states. 

Next, we simulated the dynamics of the network with two 
different initial states (/5): /5i assumes all four nodes are initially 
ON, whereas /52 assumes only EGRl is ON and all the others are 
OFF (Figure 5c). The two alternatives were chosen because neither 
of them could be ruled out based on prior knowledge. 
Independent of the choice of initial states, simulation results 
match very well with observed dynamics: EGRl induction 
gradually decays and is finally turned OFF. The other three genes 
either stay ON [IS^) or are induced from OFF to ON [ISj). This 
implies the network dynamics is encoded by its structure, and the 
Boolean network is a useful model to study this network in silico. 

Subsequently, we tested the robustness of the network by 
deleting one edge a time and simulating the network dynamics 
with permutated networks. By iterating all single-edge mutations, 
we found that the deletion of any edge involving ATF3 (including 
the self-loop) dramatically alters the steady state and/or the 
dynamics of the network (Supplementary Figure 7). This implies 
that ATF3 is an essential gene of the network, and underscores 
the importance of ATF3 in the context of cellular response to 
compound-induced toxicity.^^'^^ 

Early expression changes of the network demonstrate predictive 
power for pathological outcomes in vivo 

Finally, we evaluated the predictive power of the early-response 
network for pathological outcomes in vivo (Figure IbH). To this 
end, we trained SVM, an established tool for binary prediction.^'' 

An SVM is constructed in two steps. First, it is trained with 
samples with binary labels (the training set). The SVM learns 
patterns from the data by finding the boundary (known as the 
hyperplane) in data space that best separates samples of two 
classes. Next, the SVM is used to predict labels for a new data set 
(the test set) and its accuracy is measured by comparing the 
predictions with true labels. 

SVMs used in this study take treatment-induced differential 
expression of Egrl, Atf3, Gdfl5 and Fgf21 at 3 h (the earliest time 
point in vivo) as input, and predict histopathological outcomes at 
all tested time points of the in vivo study. For each time point, one 
SVM was trained and tested for liver and kidney, respectively. We 
found that SVMs are able to predict pathological outcomes with 
high accuracies between 80 and 97% both in liver (Figure 6a) and 
in kidney (Figure 6b). This finding suggests that expression 
changes of the four early-response genes contain information that 
can predict short-term in vivo pathology. 

To test whether the network's predictive power is superior to 
that of single genes, we compared SVMs powered by the four- 
gene network against SVMs that use single genes as input (Figures 
6a and b). The network-based SVM substantially outperformed 
single-gene competitors, confirming synergy between the genes. 



DISCUSSION 

Large-scale databases such as the Drug Activity database of 
NCI-60 cell lines,"^^ Connectivity Map"^^ and ToxCast^° have been 
extensively explored to study compound-induced gene 
expression. However, few databases collect time series data, 
which is essential to identify early-response genes. We argue that 
such early-response genes can be valuable to understand and 
predict in vitro and in vivo toxicity. Our study presents a piece of 
evidence: although the four-gene network discussed here was 
identified in primary hepatocytes, it predicted liver and kidney 
pathology with good performance. Thus, by multivariate analysis 
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of time-series data, it nnay be possible to overcome one of the 
biggest hurdles for predictive in vitro toxicity, namely the lack of 
conserved markers between in vitro and in vivo and across species. 
However, we emphasize that the marker gene set identified here, 
whereas translatable from cellular systems to rat in vivo, allows 
only for a broad categorization of potential risk. This should guide 



researchers in prioritizing candidates out of compound series, 
including more specific measurements to identify the mode of 
action. 

Many parameters have been proposed as in vitro toxicity and 
tissue injury markers. Gerets et al.^^ identified a battery of six 
cytotoxicity assays to screen pharmaceutical compounds in 
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Table 1. Treatments that induced progressive profiles in human and rat primary hepatocytes 



Human 



Rat 



Compound 


Dose level 


Dose (fiM) 


Compound 


Dose level 


Dose (fiM) 


Acetaminophen 


H 


5000 


Caffeine 


H 


1000 


Allyl alcohol 


H 


70 


Carboplatin 


M/H 


600/3000 


Azathiopine 


H 


72.8 


Cephalothin 


H 


6 


Benzbromarone 


H 


100 


Chlorpheniramine 


H 


200 


Diazepam 


H 


250 


Cisplatin 


M/H 


40/200 


Diclofenac 


H 


400 


Colchicine 


L/M/H 


200/1000/5000 


Ethionine 


H 


600 


Diclofenac 


H 


400 


Flutamide 


H 


50 


Diethyl maleate 


H 


10000 


Ketoconazole 


H 


15 


Diltiazem 


H 


250 


Methapyrilene 


H 


600 


Disopyramide 


H 


400 


Naphthyl isothiocyanate 


H 


200 


Enalapril 


H 


2000 


Nitrofurantoin 


H 


125 


Ethambutol 


H 


1000 


Omeprazole 


H 


600 


Galactosamine 


H 


10000 


Phenobarbital 


H 


10000 


Gentamicin 


H 


30 mg ml"^ 


Propylthiouracil 


M/H 


800/4000 


Hydroxyzine 


H 


150 


TGF-betal 


L/M/H 


2/10/50 


Imipramine 


H 


100 








Isoniazid 


H 


10000 








Naphthyl isothiocyanate 


H 


200 








Papaverine 


H 


100 








Puromycin aminonucleoside 


L/M/H 


100/500/2500 








Quinidine 


H 


200 








Ranitidine 


H 


4000 








Sulindac 


H 


2000 








Sulpiride 


H 


2000 








Tacrine 


H 


200 








Theophylline 


H 


10000 








Valproic acid 


H 


10000 



Abbreviations: TG-GATEs, Toxicogenomics Project-Genomics Assisted Toxicity Evaluation system; TGF, Transforming growth factor. Two overlapping 
compounds between the species are displayed in bold. Dose level abbreviations follow the convention of TG-GATEs: H, high; M, middle; L, low. 
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two adjacent groups is snnaller than 0.001. (b) Venn diagranns connparing connpounds that induce progressive profiles in hunnan and rat, and 
connparing frequently induced genes in the two species, (c) Tennporal patterns of the four hunnan orthologs that are also signatures in rat. 
See caption of Figure 3d for legends. 



HepG2 cells using eight drugs. Recently, Bailey et al. evaluated 
34 acute rat toxicity studies and proposed three novel candidate 
genes {GSTA,ARG1 and HPD) in addition to the established ALT as 



drug-induced liver injury biomarkers in rats. Connplementary to 
these studies, we present here with the network of EGRl, ATF3, 
GDF15 and FGF21 a signature set detectable as early as 2h after 
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Figure 5. Dynannics of the early-response network is encoded by its structure, (a) Dynannics of the network in hunnan (left) and rat (right), 
showing their arithnnetic nnean (lines) and standard errors of the nnean (error bars), (b) The Boolean network nnodel. Four genes are connected 
with thick lines ennbodying binary interactions. Solid-arrow tip: positive trans-activation; solid-circle tip: indirect activation; bar: transcriptional 
inhibition. Biological functions are given in rectangular nodes, with genes connected to thenn with thin lines. Solid-circle tips indicate positive 
effects, and bars indicate negative effects. Only gene nodes and their interactions are used for nnodelling and sinnulation. (c) Sinnulated 
dynannics using the Boolean network nnodel with the initial state /Si (top) and IS2 (bottonn). Randonn noise is added to facilitate visualization. 
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Figure 6. Perfornnance (Fi-score) of support vector nnachines (SVMs) with four genes (network) as input and of SVMs with individual genes as 
input, in liver (a) and kidney (b). 



compound administration. We propose the network should be 
monitored in combination with established parameters for better 
toxicity prediction. 

Biological functions and biomarker potentials of EGRl, ATF3, 
GDF15 and FGF21 have been sporadically proposed for com- 
pound-induced in vitro and in vivo toxicity. EGR1, arguably 
the best-studied gene among them, is stimulated by many 



extracellular molecules. It links signalling cascades controlling 
cellular proliferation and apoptosis.^^ Bioinformatics and mRNA 
expression analyses showed Egrl mRNA activation is dependent 
on the activation of the Ras-Raf-Mek-Erk signalling pathway.^^'^^ 
Transforming growth factor-p, an important mediator of the 
cellular response to external stimuli and xenobiotics, activates 
EGRl via SMAD3.^^ Upregulation of EGRl was described as an 
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adaptive measure to attenuate sulindac sulphide-mediated 
cytotoxicity in human intestinal epithelial cells.^^ In vivo studies 
revealed that EGR1 is essential for ethanol-induced^^ or 
cholestatic^^ liver injury. Our study provides further evidences 
that early activation of EGRl, presumably an adaptive reaction 
against cell death, is a signature of compound-induced toxicity. Its 
expression alone, however, is not sufficient for prediction unless 
the states of the other genes in the network are known. 

ATF3, a direct transcriptional target of EGRl, acts as a central 
hub coordinating cellular stress pathways.'^^'^^ It has a key role in 
the transforming growth factor-^-SMAD pathway, providing a 
convergence point for the joint control of epithelial cells by multiple 
stress response pathways.^^ Therefore, it has been proposed as 
marker for a variety of stressed tissues including liver, heart, brain^^ 
and nerve.^^ Our findings underpin its role in stress response and 
highlight its synergy with other genes of the network. 

The other two members of the network both protect cells from 
apoptosis. GDF15, a transforming growth factor-p superfamily 
member, protects heart from ischemia/reperfusion injury.^"^ Its 
rapid induction was shown in various models of liver, lung and 
kidney injury .^^'^^ Serum levels of FGF21, a secreted protein, were 
recently proposed as a biomarker for liver and kidney 
diseases.^^'^^ 

Although their functions have been individually characterized, 
the hitherto undescribed coherent dynamics of the four genes 
induced by distinct compounds in human and rat suggest that 
they are more closely linked than previously thought. The network 
may act both as a convergence point of upstream stress-sensing 
pathways and as a core module coordinating downstream 
responses. Boolean network modelling supports the notion that 
the network's intrinsic dynamics is encoded by its structure. It 
presents a hypothesis to explain the conserved dynamics and the 
generality across compounds, organ types, and even species. We 
believe this hypothesis should be challenged by studies with other 
cell types, and studies in model organisms beyond rat and human. 

In conclusion, we report EGRl, ATF3, GDF15 and FGF21 as a 
consensus early signature of in vitro and in vivo toxicity in human 
and rat. It was essential to focus on early time points, because at 
this stage there seems to be high conservation of general stress- 
response signals, which diverge in later time points. Our findings 
demonstrate the translational value of multivariate time-series 
data in toxicity studies and the potential of early-response genes 
as predictive toxicity signatures. We recommend monitoring the 
network in first-line compound screenings to increase the efficiency 
of safety evaluations and to reduce costs and animal use. 
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